1 Introduction
This document demonstrates how to perform clustering in Python using the scikit-learn library. Clustering is an unsupervised learning technique that groups similar data points together based on their inherent characteristics. We will use the iris dataset for this demonstration.
2 Load Data
First, we load the necessary libraries and the iris dataset.
```python
import pandas as pd
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import seaborn as sns

# Load the iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)

# Standardize the data
scaler = StandardScaler()
scaled_iris = scaler.fit_transform(iris_df)
```
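Standardizing matters here because both clustering methods in this document rely on distances between points, so a feature measured on a larger scale would otherwise dominate. As a quick sanity check (a minimal sketch, not part of the original workflow), the block below rebuilds the scaled array and confirms that each column ends up with a mean of about 0 and a standard deviation of about 1. The names `col_means` and `col_stds` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Rebuild the scaled data so this block runs on its own
iris = load_iris()
scaled_iris = StandardScaler().fit_transform(iris.data)

# Per-feature mean and population standard deviation (ddof=0),
# which is the convention StandardScaler uses
col_means = scaled_iris.mean(axis=0)
col_stds = scaled_iris.std(axis=0)

for name, mean, std in zip(iris.feature_names, col_means, col_stds):
    print(f"{name}: mean={mean:.3f}, std={std:.3f}")
```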
3 K-Means Clustering
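K-Means is a popular clustering algorithm that partitions the data into k clusters by assigning each point to its nearest cluster centroid. We will use it to group the standardized iris data into 3 clusters.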
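A minimal sketch of this step follows. It reloads and standardizes the data so the block runs on its own, fits `KMeans` with `n_clusters=3`, and plots the first two standardized features colored by cluster. The settings `random_state=42` and `n_init=10` are choices made for this sketch to keep the result reproducible. Other values would work as well.

```python
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

# Reload and standardize the data so this block is self-contained
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
scaled_iris = StandardScaler().fit_transform(iris_df)

# Fit K-Means with 3 clusters; fixed seed and 10 restarts for reproducibility
kmeans = KMeans(n_clusters=3, random_state=42, n_init=10)
iris_df['kmeans_cluster'] = kmeans.fit_predict(scaled_iris)

# Visualize the clusters on the first two standardized features
plt.figure(figsize=(10, 6))
sns.scatterplot(x=scaled_iris[:, 0], y=scaled_iris[:, 1],
                hue=iris_df['kmeans_cluster'], palette='viridis', s=100)
plt.title('K-Means Clustering of Iris Data')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
```

The plot shows only two of the four features, so clusters that look overlapping here may still be separated along the petal measurements.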
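4 Hierarchical Clustering
Hierarchical clustering is another common clustering method. The agglomerative variant used here starts with every point as its own cluster and repeatedly merges the closest pair until 3 clusters remain. scikit-learn's AgglomerativeClustering uses Ward linkage by default.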
```python
# Uses iris, iris_df, and scaled_iris from the Load Data step
hierarchical = AgglomerativeClustering(n_clusters=3)
iris_df['hierarchical_cluster'] = hierarchical.fit_predict(scaled_iris)

# Visualize the clusters on the first two standardized features
plt.figure(figsize=(10, 6))
sns.scatterplot(x=scaled_iris[:, 0], y=scaled_iris[:, 1],
                hue=iris_df['hierarchical_cluster'], palette='viridis', s=100)
plt.title('Hierarchical Clustering of Iris Data')
plt.xlabel(iris.feature_names[0])
plt.ylabel(iris.feature_names[1])
plt.show()
```
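Since both methods assign every flower to one of three clusters, one way to see how closely they agree is to cross-tabulate the two label sets and compute the adjusted Rand index. This check is an addition to the original walkthrough, sketched under the assumption that K-Means is run with `random_state=42` and `n_init=10`. The variable names are illustrative.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import load_iris
from sklearn.metrics import adjusted_rand_score
from sklearn.metrics.cluster import contingency_matrix
from sklearn.preprocessing import StandardScaler

# Rebuild the standardized data so this block runs on its own
iris = load_iris()
scaled_iris = StandardScaler().fit_transform(iris.data)

# Cluster with both methods (K-Means settings are assumptions of this sketch)
kmeans_labels = KMeans(n_clusters=3, random_state=42, n_init=10).fit_predict(scaled_iris)
hier_labels = AgglomerativeClustering(n_clusters=3).fit_predict(scaled_iris)

# Rows: K-Means clusters, columns: hierarchical clusters
table = contingency_matrix(kmeans_labels, hier_labels)
print(table)

# 1.0 means identical groupings (up to relabeling); values near 0 mean chance-level agreement
ari = adjusted_rand_score(kmeans_labels, hier_labels)
print(f"Adjusted Rand index: {ari:.3f}")
```

Cluster numbers are arbitrary, so cluster 0 from one method need not correspond to cluster 0 from the other. The adjusted Rand index ignores the numbering and compares only which points are grouped together.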
5 Conclusion
This document provided a brief overview of clustering in Python using scikit-learn. We demonstrated both K-Means and Hierarchical clustering on the iris dataset.